Abstract



The quantum query complexity of evaluating any read-once formula with n black-box input
bits is $\Theta(\sqrt{n})$. However, the corresponding problem for read-many formulas (i.e., formulas in
which the inputs have fanout) is not well understood. Although the optimal read-once formula
evaluation algorithm can be applied to any formula, it can be suboptimal if the inputs have large
fanout. We give an algorithm for evaluating any formula with n inputs, size S, and G gates
using $O(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$ quantum queries. Furthermore, we show that this algorithm is
optimal, since for any n, S, G there exists a formula with n inputs, size at most S, and at most G
gates that requires $\Omega(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$ queries. We also show that the algorithm remains
nearly optimal for circuits of any particular depth $k \ge 3$, and we give a linear-size circuit of
depth 2 that requires $\tilde{\Omega}(n^{5/9})$ queries. Applications of these results include a $\tilde{\Omega}(n^{19/18})$ lower
bound for Boolean matrix product verification, a nearly tight characterization of the quantum
query complexity of evaluating constant-depth circuits with bounded fanout, new formula gate
count lower bounds for several functions including parity, and a construction of an $\mathrm{AC}^0$ circuit
of linear size that can only be evaluated by a formula with $\Omega(n^{2-\epsilon})$ gates.

1 Introduction

A major problem in query complexity is the task of evaluating a Boolean formula on a black-box
input. In this paper, we restrict our attention to the standard gate set {AND, OR, NOT}. A formula
is a rooted tree of NOT gates and unbounded-fanin AND and OR gates, where each leaf represents 
an input bit and each internal vertex represents a logic gate acting on its children. The depth of a 
formula is the length of a longest path from the root to a leaf. 

The quantum query complexity of evaluating read-once formulas is now well understood. (In this
paper, the "query complexity" of f always refers to the bounded-error quantum query complexity,
denoted Q(f).) A formula is read-once if each input bit appears at most once in it. Grover's
algorithm shows that a single OR gate on n inputs can be evaluated in $O(\sqrt{n})$ queries [18], which
is optimal [9]. It readily follows that balanced constant-depth read-once formulas can be evaluated
in $\tilde{O}(\sqrt{n})$ queries [12] (where we use a tilde to denote asymptotic bounds that neglect logarithmic
factors), and in fact $O(\sqrt{n})$ queries are sufficient [19]. A breakthrough result of Farhi, Goldstone,
and Gutmann showed how to evaluate a balanced binary AND-OR formula with n inputs in time
$O(\sqrt{n})$ [16] (in the Hamiltonian oracle model, which easily implies an upper bound of $n^{1/2+o(1)}$
queries [14]). Subsequently, it was shown that any read-once formula whatsoever can be evaluated
in $n^{1/2+o(1)}$ queries [3], and indeed $O(\sqrt{n})$ queries suffice [27]. This result is optimal, since any
read-once formula requires $\Omega(\sqrt{n})$ queries to evaluate [5].

Now that the quantum query complexity of evaluating read-once formulas is tightly charac- 
terized, it is natural to consider the query complexity of evaluating more general formulas, which 
we call "read-many" formulas to differentiate them from the read-once case. Note that we cannot 
expect to speed up the evaluation of arbitrary read-many formulas on n inputs, since such formulas 
are capable of representing arbitrary functions, and some functions, such as parity, require as many 
quantum queries as classical queries up to a constant factor [7, 17]. Thus, to make the task of 
read-many formula evaluation nontrivial, we must take into account other properties of a formula 
besides the number of inputs. 

Two natural size measures for formulas are formula size and gate count. The size of a formula, 
which we denote by S, is defined as the total number of inputs counted with multiplicity (i.e., if an 
input bit is used k times it is counted as k inputs). The gate count, which we denote by G, is the 
total number of AND and OR gates in the formula. (By convention, NOT gates are not counted.) 
Note that G < S since for a given value of S, G is largest when the formula is a binary tree with S 
leaves, but the number of internal vertices in a binary tree is one less than the number of leaves. 
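
To make the three measures concrete, here is a minimal sketch that computes n, S, and G for a formula stored as nested tuples. The tuple encoding and the function names are illustrative assumptions, not notation from the paper.

```python
# Formulas as nested tuples (an illustrative encoding, not the paper's):
#   ("var", i)            leaf reading input bit x_i
#   ("not", child)        NOT gate
#   ("and", c1, c2, ...)  unbounded-fanin AND gate
#   ("or",  c1, c2, ...)  unbounded-fanin OR gate

def formula_size(f):
    """S: number of leaves, counted with multiplicity."""
    if f[0] == "var":
        return 1
    return sum(formula_size(child) for child in f[1:])

def gate_count(f):
    """G: number of AND and OR gates; NOT gates are not counted."""
    if f[0] == "var":
        return 0
    own = 1 if f[0] in ("and", "or") else 0
    return own + sum(gate_count(child) for child in f[1:])

def input_set(f):
    """Indices of the distinct input bits that appear in the formula."""
    if f[0] == "var":
        return {f[1]}
    return set().union(*(input_set(child) for child in f[1:]))

# x_0 XOR x_1 = (x_0 AND NOT x_1) OR (NOT x_0 AND x_1)
x0, x1 = ("var", 0), ("var", 1)
xor2 = ("or", ("and", x0, ("not", x1)), ("and", ("not", x0), x1))

n, S, G = len(input_set(xor2)), formula_size(xor2), gate_count(xor2)
print(n, S, G)  # 2 4 3
```

In this example each of the two input bits is read twice, so S = 4 while n = 2, and the three gates satisfy G < S.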

A formula of size S can be viewed as a read-once formula on S inputs by neglecting the fact 
that some inputs are identical, thereby giving a query complexity upper bound from the known 
formula evaluation algorithms. Thus, an equivalent way of stating the results on the evaluation of 
read-once formulas that also applies to read-many formulas is the following: 

Theorem 1 (Formula evaluation algorithm [27, Corollary 1.1]). The bounded-error quantum query
complexity of evaluating a formula of size S is $O(\sqrt{S})$.

However, this upper bound does not exploit the fact that some inputs may be repeated, so it can
be suboptimal for formulas with many repeated inputs. We study such formulas and tightly
characterize their query complexity in terms of the number of inputs n (counted without multiplicity),
the formula size S, and the gate count G. By preprocessing a given formula using a combination
of classical techniques and quantum search before applying the algorithm for read-once formula
evaluation, we show that any read-many formula can be evaluated in $O(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$
queries (Theorem 2). Furthermore, we show that for any n, S, G, there exists a read-many
formula with n inputs, size at most S, and at most G gates such that any quantum algorithm needs
$\Omega(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$ queries to evaluate it (Theorem 3). We construct these formulas by
carefully composing formulas with known quantum lower bounds and then applying recent results on
the behavior of quantum query complexity under composition [26, 27].

To refine our results on read-many formulas, it is natural to consider the query complexity of
formula evaluation as a function of the depth k of the given formula in addition to its number of
input bits n, formula size S, and gate count G. Beame and Machmouchi showed that
constant-depth formulas (in particular, formulas of depth k = 3 with size $\tilde{O}(n^2)$) can require $\tilde{\Omega}(n)$ queries
[8]. However, to the best of our knowledge, nothing nontrivial was previously known about the case
of depth-2 formulas (i.e., CNF or DNF expressions) with polynomially many gates, or formulas of
depth 3 and higher with gate count $o(n^2)$ (in particular, say, with a linear number of gates).

Building upon the results of [8], we show that the algorithm of Theorem 2 is nearly optimal
even for formulas of any fixed depth $k \ge 3$. While we do not have a tight characterization for
depth-2 formulas, we improve upon the trivial lower bound of $\Omega(\sqrt{n})$ for depth-2 formulas of linear
gate count (a case arising in an application), giving an example of such a formula that requires
$\tilde{\Omega}(n^{5/9})$ queries (Corollary 1). It remains an open question to close the gap between
this lower bound and the upper bound of $O(n^{0.75})$ provided by Theorem 2, and in general, to better
understand the quantum query complexity of depth-2 formulas.

Aside from being a natural extension of read-once formula evaluation, read-many formula eval- 
uation has potential applications to open problems in quantum query complexity. For example, the 
graph collision problem [24] can be expressed as a depth-2 read-many formula of quadratic gate 
count. 

More concretely, we apply our results to better understand the quantum query complexity
of Boolean matrix product verification. This is the task of verifying whether AB = C, where
A, B, C are $n \times n$ Boolean matrices provided by a black box, and the matrix product is computed
over the Boolean semiring, in which OR plays the role of addition and AND plays the role of
multiplication. Buhrman and Špalek gave an upper bound of $O(n^{1.5})$ queries for this problem, and
their techniques imply a lower bound of $\Omega(n)$ [13]. We improve the lower bound to $\tilde{\Omega}(n^{19/18}) =
\Omega(n^{1.055})$ (Theorem 6), showing in particular that linear query complexity is not achievable.

Our results can be viewed as a first step toward understanding the quantum query complexity 
of evaluating general circuits. A circuit is a directed acyclic graph in which source vertices represent 
black-box input bits, sink vertices represent outputs, and internal vertices represent AND, OR, and 
NOT gates. The size of a circuit is the total number of AND and OR gates, and the depth of a circuit
is the length of a longest (directed) path from an input bit to an output bit. 

The main difference between a formula and a circuit is that circuits allow fanout for gates, 
whereas formulas do not. In a circuit, the value of an input bit can be fed into more than one gate, 
and the output of a gate can be fed into more than one subsequent gate. A read-once formula is 
a circuit in which neither gates nor inputs have fanout. A general formula allows fanout for the 
inputs, but not for the gates. Note that by convention, the size of a circuit is the number of gates, 
whereas the size of a formula is the total number of inputs counted with multiplicity, so one must 
take care to avoid confusion. For example, the circuit that has 1 OR gate with n inputs has circuit 
size 1, formula size n, and formula gate count 1. To help clarify this distinction, we consistently 
use the symbol S for formula size and the symbol G for both formula gate count and circuit size. 

As another preliminary result on circuit evaluation, we provide a nearly tight characterization of 
the quantum query complexity of evaluating constant-depth circuits with bounded fanout. We show 
that any constant-depth bounded-fanout circuit of size G can be evaluated in $O(\min\{n, n^{1/2} G^{1/4}\})$
queries, and that there exist such circuits requiring $\tilde{\Omega}(\min\{n, n^{1/2} G^{1/4}\})$ queries to evaluate.

Finally, our results on the quantum query complexity of read-many formula evaluation have 
two purely classical applications. First, we give lower bounds on the number of gates (rather than 
simply the formula size) required for a formula to compute various functions. For example, while it is 
known that parity requires a formula of size $\Omega(n^2)$ [21], we show that in fact any formula computing
parity must have $\Omega(n^2)$ gates, which is a stronger statement. We also give similar results for several
other functions. Second, for any $\epsilon > 0$, we give an example of an explicit constant-depth circuit of
linear size that requires $\Omega(n^{2-\epsilon})$ gates to compute with a formula of any depth (Theorem 8).

The remainder of this paper is organized as follows. In Section 2 we present the algorithm for 
evaluating read-many formulas, and in Section 3 we present a matching lower bound. In Section 4 
we study the quantum query complexity of evaluating constant-depth formulas and constant-depth 
bounded-fanout circuits. We present applications of these results in Section 5: the lower bound 
for Boolean matrix product verification appears in Section 5.1, and results on classical circuit 
complexity appear in Section 5.2. Finally, we conclude in Section 6 with a discussion of the results 
and some open problems.

2 Algorithm



In this section, we describe an algorithm for evaluating any formula on n inputs with G gates
that makes $O(n^{1/2} G^{1/4})$ queries to the inputs. When $G = \Omega(n^2)$, the upper bound is larger
than n, so it is more efficient to simply read the entire input instead. Similarly, if $n^{1/2} G^{1/4}$ is
larger than $\sqrt{S}$, then it is preferable to use Theorem 1. Thus, overall, we find an upper bound of
$O(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$ queries.
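
The choice among the three strategies can be sketched as a small calculation. The helper below is hypothetical (its name and the strategy labels are not from the paper) and it drops all constant factors, so it only indicates which term of the minimum dominates for given n, S, and G.

```python
import math

def query_upper_bound(n, S, G):
    """Return (strategy, bound) for min{n, sqrt(S), n^(1/2) G^(1/4)}, ignoring constants."""
    candidates = {
        "read the whole input": float(n),
        "apply Theorem 1 directly": math.sqrt(S),
        "prune, then apply Theorem 1": math.sqrt(n) * G ** 0.25,
    }
    strategy = min(candidates, key=candidates.get)
    return strategy, candidates[strategy]

# n = 10^4 inputs, size 10^7, 10^4 gates: pruning first gives about 10^3 queries,
# compared with about 3.2 * 10^3 for Theorem 1 alone and 10^4 for reading everything.
print(query_upper_bound(n=10**4, S=10**7, G=10**4))
```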

In the next section we show that this algorithm is optimal in the sense that there exists a formula 
with n inputs, size at most S, and at most G gates that cannot be evaluated asymptotically faster. 
Note that this does not mean that all formulas with given values of n,S,G have the same query 
complexity; formulas with appropriate structure can sometimes be evaluated much faster. As an 
extreme example, the formula $x_1 \vee \bar{x}_1 \vee \cdots \vee x_n \vee \bar{x}_n$ is the constant function 1, so it requires no
queries to evaluate. 

We first describe the algorithm at a high level. We wish to evaluate a formula f with n inputs
and G gates, using at most $O(n^{1/2} G^{1/4})$ queries. Consider directly applying the formula evaluation
algorithm (Theorem 1). If the bottommost gates of f have a large fanin, say $\Omega(n)$, then the formula
size can be as large as $\Omega(nG)$. If we apply the formula evaluation algorithm to this formula, we
will get an upper bound of $O(\sqrt{nG})$, which is larger than claimed. So suppose that the formula
size of f is large.

Since the formula size is large, there must be some inputs that feed into a large number of gates, 
i.e., inputs with high degree. Among the inputs that feed into many OR gates, if any input is 1, 
then this immediately fixes the output of a large number of OR gates, reducing the formula size. A 
similar argument applies to inputs that are 0 and feed into AND gates. Thus the first step in our
algorithm is to find high-degree inputs and eliminate them, reducing the formula size considerably.
Using at most $O(n^{1/2} G^{1/4})$ queries, we show how to eliminate enough high-degree inputs that the
resulting formula has formula size $O(n\sqrt{G})$. Then we use Theorem 1 on the resulting formula to
achieve the claimed bound.

More precisely, our algorithm converts a formula f on n inputs and G gates into another formula
f' of size $O(n\sqrt{G})$ on the same input. The new formula f' has the same output as f (on the given
input), and this stage makes $O(n^{1/2} G^{1/4})$ queries. We call this the formula pruning algorithm.

Before explaining the algorithm, we need a subroutine to find a marked entry in a string of 
length n, with good expected performance when there are many marked entries. One can find a 
marked entry with $O(\sqrt{n/t})$ expected queries, where t is the number of marked items, even when
the value t is not known [11]: 

Lemma 1. Given an oracle for a string $x \in \{0, 1\}^n$, there exists a quantum algorithm that
outputs with high probability the index of a marked item in x if it exists, making $O(\sqrt{n/t})$ queries
in expectation when there are t marked items. If there are no marked items the algorithm runs
indefinitely.

We are now ready to state our formula pruning algorithm. 

Lemma 2 (Formula pruning algorithm). Given a formula f with n inputs and G gates, and an
oracle for the input x, there exists a quantum algorithm that makes $O(n^{1/2} G^{1/4})$ queries and returns
a formula f' on the same input x, such that f'(x) = f(x) and f' has formula size $O(n\sqrt{G})$.

Proof. First consider the set of all inputs that feed into OR gates. From these inputs and OR gates, 
we construct a bipartite graph with the inputs on one side and OR gates on the other. We put 
an edge between an input bit and all of the OR gates that take it as input. An input is called a 
high-degree input if it has degree greater than $\sqrt{G}$.

Now repeat the following process. First, use Lemma 1 to find any marked high-degree input.
All OR gates connected to this marked input have output 1, so we delete these gates and their input
wires and replace the gates with the constant 1. This input is removed from the set of inputs for 
the next iteration. If all the OR gates have been deleted or if there are no more high-degree inputs, 
the algorithm halts. 

Say the process repeats $k - 1$ times and in the $k$th round there are no remaining marked
high-degree inputs, although there are high-degree inputs. Then the process will remain stuck in the
search subroutine (Lemma 1) in the $k$th round. Let the number of marked high-degree inputs at
the $j$th iteration of the process be $m_j$. Note that $m_j > m_{j+1}$ since at least 1 high-degree input is
eliminated in each round. However it is possible that more than 1 high-degree input is eliminated
in one round since the number of OR gates reduces in each round, which affects the degrees of the
other inputs. Moreover, in the last round there must be at least 1 marked item, so $m_{k-1} \ge 1$.
Combining this with $m_j > m_{j+1}$, we get $m_{k-r} \ge r$.

During each iteration, at least $\sqrt{G}$ OR gates are deleted. Since there are fewer than G OR gates
in total, this process can repeat at most $\sqrt{G}$ times before we learn the values of all the OR gates,
which gives us $k \le \sqrt{G} + 1$.

In the $j$th iteration, finding a marked input requires $O(\sqrt{n/m_j})$ queries in expectation by
Lemma 1. Thus the total number of queries made in expectation until the $k$th round, but not counting
the $k$th round itself, is

$$\sum_{j=1}^{k-1} O\left(\sqrt{n/m_j}\right) \le \sum_{r=1}^{k-1} O\left(\sqrt{n/r}\right) = O(\sqrt{nk}) = O(n^{1/2} G^{1/4}). \tag{1}$$
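
The bound on this sum can be checked by a short integral comparison. The following worked derivation is a sketch using only the facts stated above, namely $m_{k-r} \ge r$ and $k - 1 \le \sqrt{G}$; the explicit constant 2 is not claimed in the text and is only what this particular estimate yields.

```latex
\begin{align*}
\sum_{j=1}^{k-1} \sqrt{\frac{n}{m_j}}
  &= \sum_{r=1}^{k-1} \sqrt{\frac{n}{m_{k-r}}}
   \;\le\; \sqrt{n} \sum_{r=1}^{k-1} r^{-1/2} \\
  &\le \sqrt{n} \left( 1 + \int_{1}^{k-1} t^{-1/2} \, dt \right)
   \;=\; \sqrt{n} \left( 2\sqrt{k-1} - 1 \right) \\
  &\le 2 \sqrt{n (k-1)} \;\le\; 2 \, n^{1/2} G^{1/4}.
\end{align*}
```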

So in expectation, $O(n^{1/2} G^{1/4})$ queries suffice to reach the $k$th round, i.e., to reach a stage where
there are no high-degree marked inputs. To get an algorithm with worst-case query complexity
$O(n^{1/2} G^{1/4})$, we simply halt this algorithm after it has made some constant times its expected
number of queries. This gives a bounded-error algorithm with the same worst-case query complexity.

Next, we repeat the same process with AND gates while searching for high-degree inputs that
are zero. This also requires the same number of queries. At the end of both these steps, each input
has at most $\sqrt{G}$ outgoing wires to OR gates and at most $\sqrt{G}$ outgoing wires to AND gates. This
yields a formula f' of size $O(n\sqrt{G})$ on the same inputs. □
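
The structure of the OR-gate half of this procedure can be illustrated with a purely classical simulation. This is a sketch under stated assumptions, not the quantum algorithm itself: the quantum search of Lemma 1 is replaced by a plain scan, G is taken to be the number of OR gates in the bipartite graph, and the final step (dropping the wires of high-degree inputs once they are known to be 0) is our reading of how the degree bound at the end of the proof is reached. The function name and data layout are hypothetical.

```python
import math
from collections import Counter

def prune_or_gates(or_gates, x):
    """Classical stand-in for the OR-gate half of the pruning procedure.

    or_gates: dict mapping a gate id to the set of input indices wired into it.
    x:        list of input bits.
    Returns (gates fixed to 1, surviving gates with reduced wiring, rounds used).
    """
    threshold = math.sqrt(len(or_gates))          # high degree means > sqrt(G)
    remaining = {g: set(ins) for g, ins in or_gates.items()}
    fixed_to_one = set()
    rounds = 0
    while True:
        degree = Counter(i for ins in remaining.values() for i in ins)
        high = [i for i, d in degree.items() if d > threshold]
        # Lemma 1 would locate a marked (value-1) high-degree input by quantum
        # search; this sketch simply scans, which costs up to n classical queries.
        marked = [i for i in high if x[i] == 1]
        if not marked:
            break
        hit = [g for g, ins in remaining.items() if marked[0] in ins]
        fixed_to_one.update(hit)                  # an OR gate with a 1 input outputs 1
        for g in hit:
            del remaining[g]
        rounds += 1
    # Assumption: every high-degree input still present is now known to be 0,
    # so its wires into OR gates can be dropped without changing any gate value.
    for ins in remaining.values():
        ins.difference_update(high)
    return fixed_to_one, remaining, rounds

# 16 OR gates: input 0 feeds all of them, input 1 feeds gates 0..7,
# and inputs 2..7 are spread thinly over the gates.
gates = {g: {0, 2 + g % 6} | ({1} if g < 8 else set()) for g in range(16)}
x = [0, 1, 0, 0, 1, 0, 0, 0]
fixed, remaining, rounds = prune_or_gates(gates, x)
print(sorted(fixed), rounds)  # [0, 1, 2, 3, 4, 5, 6, 7] 1
```

In this toy instance a single round fixes eight gates to 1, and afterwards no input feeds more than $\sqrt{16} = 4$ of the surviving OR gates.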

Combining this lemma with the formula evaluation algorithm gives the following. 

Theorem 2. The bounded-error quantum query complexity of evaluating a formula with n inputs,
size S, and G gates is $O(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$.

Proof. We present three algorithms, with query complexities $O(n)$, $O(\sqrt{S})$, and $O(n^{1/2} G^{1/4})$, which
together imply the desired result. Reading the entire input gives an $O(n)$ upper bound, and
Theorem 1 gives an upper bound of $O(\sqrt{S})$. Finally, we can use Lemma 2 to convert the given
formula to one of size $O(n\sqrt{G})$, at a cost of $O(n^{1/2} G^{1/4})$ queries. Theorem 1 shows that this
formula can be evaluated using $O(\sqrt{n\sqrt{G}}) = O(n^{1/2} G^{1/4})$ queries. □

Let us return to the observation that our algorithm does better than the naive strategy of
directly applying Theorem 1 to the given formula with G gates on n inputs, since its formula size
could be as large as $\Omega(nG)$, yielding a suboptimal algorithm. Nevertheless, one might imagine
that for every formula with G gates on n inputs, there exists another formula f' that represents the
same function and has formula size $O(n\sqrt{G})$. This would imply Theorem 2 directly using Theorem 1.
However, this is not the case: there exists a formula with G gates on n inputs such that any
formula representing the same function has formula size $\Omega(nG/\log n)$. This shows that in the worst
case, the formula size of such a function might be close to $\Theta(nG)$.

Proposition 1. There exists a function f that can be represented by a formula with G gates on n
inputs, and any formula representing it must have formula size $S = \Omega(nG/\log n)$.

Proof sketch. The proof is a counting argument. The total number of formulas of size S is at most
$\exp(O(S \log n))$, while there are at least $\exp(\Omega(nG))$ distinct functions with G-gate formulas. Thus
$S = \Omega(nG/\log n)$. □
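
The final step of the counting argument can be spelled out as follows. Here $c_1, c_2 > 0$ are unspecified constants and $S^*$ denotes the largest formula size needed by any function that has a G-gate formula; both are notation introduced only for this sketch.

```latex
\begin{gather*}
\exp(c_1 S^* \log n)
  \;\ge\; \#\{\text{formulas of size at most } S^*\}
  \;\ge\; \#\{\text{functions with $G$-gate formulas}\}
  \;\ge\; \exp(c_2 \, nG) \\
\Longrightarrow \quad
S^* \;\ge\; \frac{c_2}{c_1} \cdot \frac{nG}{\log n}
  \;=\; \Omega\!\left( \frac{nG}{\log n} \right).
\end{gather*}
```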

3 Lower bounds 

We now turn to lower bounds on the quantum query complexity of evaluating read-many formulas. 
We begin by proving a basic lemma on the query complexity of circuits obtained by composition, 
and then use this lemma (together with known quantum lower bounds) to show that the algorithm 
of Theorem 2 is optimal. 

3.1 Composition 

Recent work on the quantum adversary method has shown that quantum query complexity behaves 
well with respect to composition of functions [26, 27]. Here we apply this property to characterize 
the query complexity of composed circuits. By composing circuits appropriately, we construct a 
circuit whose depth is one less than the sum of the depths of the constituent circuits, but whose 
query complexity is still the product of those for the constituents. This construction can be used 
to give tradeoffs between circuit size and quantum query complexity: in particular, it shows that 
good lower bounds for functions with large circuits can be used to construct weaker lower bounds 
for functions with smaller circuits. 

First we note some simple transformations that can be applied to a given circuit. Without 
decreasing the query complexity or increasing the depth, and at the cost of at most doubling the 
number of inputs and gates, we can assume that the given circuit is monotone (i.e., consists only 
of AND and OR gates, with no NOT gates), and its topmost gate is a gate of our choosing (either
AND or OR, as desired). 

Lemma 3. Let f be a circuit with $n_f$ inputs, having depth k and size G. Then there is a monotone
circuit f' with $2n_f$ inputs, size at most 2G, depth k, and a topmost gate either AND or OR (as
desired), such that $Q(f) \le Q(f')$. Furthermore, if f is a formula, f' is also a formula of the same
size.

Proof. First observe that any NOT gates in the circuit f can be pushed to the inputs using De
Morgan's laws. This at most doubles the number of gates. Then, if f has input variables $x_1, \ldots, x_{n_f}$,
let f' have the $2n_f$ inputs $x_1, \bar{x}_1, \ldots, x_{n_f}, \bar{x}_{n_f}$, so that it is unnecessary to apply NOT gates to the
inputs. Now $Q(f) \le Q(f')$ because any algorithm for f' can be converted to an algorithm for f
using the same number of queries.

To switch the output gate from AND to OR or vice versa, simply consider the negation of f.
The resulting function has the same query complexity, but the output gate is switched. □

Now we are ready to prove the composition lemma.

Lemma 4. Let f be a circuit with $n_f$ inputs, having depth $k_f$ and size $G_f$, and let g be a circuit
with $n_g$ inputs, having depth $k_g$ and size $G_g$. Then there exists a circuit h with $n_h = 4 n_f n_g$ inputs,
having depth $k_h = k_f + k_g - 1$ and size $G_h \le 2G_f + 4 n_f G_g$, such that $Q(h) = \Omega(Q(f) Q(g))$.
Furthermore, if f is a formula and $k_g = 1$, then h is a formula of size $S_h = S_f S_g$.

Proof. By Lemma 3, we can assume that f and g are monotone at the cost of replacing $n_f$ by
$2n_f$, $G_f$ by $2G_f$, $n_g$ by $2n_g$, and $G_g$ by $2G_g$, but with no change to the depths of the circuits.
Furthermore, we can assume without loss of generality that the gates of f at level $k_f$ are of the
same type as the top gate of g. These assumptions ensure that when we compose f with g, the top
gate of g can be merged with the gates of f at level $k_f$, since all such gates are of the same type.
Now let $h = f \circ (g, \ldots, g)$ be the composition of f with $n_f$ copies of g, i.e.,

$$h(x_1, \ldots, x_{n_f n_g}) = f\big(g(x_1, \ldots, x_{n_g}), \ldots, g(x_{n_f n_g - n_g + 1}, \ldots, x_{n_f n_g})\big). \tag{2}$$

By combining adjacent gates of the same type, we have $k_h = k_f + k_g - 1$. The expressions for the
number of inputs and the size $G_h$ are immediate.

Theorem 1.5 of [27] shows that the quantum query complexity of the composed function is 
simply the product of the individual query complexities, up to some constant factor. 

If $k_g = 1$, then g is simply an AND or OR gate, so it clearly can be composed with a formula f
to give a formula h. Since each of the $S_f$ inputs of f (counted with multiplicity) gives rise to $S_g$
inputs of g (again, counted with multiplicity), the size of this formula is simply $S_h = S_f S_g$. □

Observe that in general, the fanout of most gates in h is inherited from the corresponding gates
in f and g, except that the gates at level $k_f$ of f are combined with the top gate of g. For example,
if f is a read-once formula and g is a formula (or if f is a formula and $k_g = 1$ as considered above),
then h is a formula.

3.2 Optimality of Theorem 2 

In this section we give examples of functions for which the algorithm of Theorem 2 is optimal. We 
obtain such functions by composing a formula for the PARITY function with AND gates. 

It is well known that the parity of n bits, denoted $\mathrm{PARITY}_n$, can be computed by a formula of
size $O(n^2)$. An explicit way to construct this formula is by recursion. The parity of two bits x and
y can be expressed by a formula of size 4: $x \oplus y = (x \wedge \bar{y}) \vee (\bar{x} \wedge y)$. When n is a power of 2, given
formulas of size $n^2/4$ for the parity of the first and second half of the input, we get a formula of
size $n^2$. When n is not a power of 2 we can use the next largest power of 2 to obtain a formula size
upper bound. Thus parity has a formula of size $O(n^2)$. Consequently, the number of gates in this
formula is $O(n^2)$.
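
The recursion above can be checked directly for a small power of 2. The sketch below builds the formula as nested tuples (an illustrative encoding with hypothetical helper names), applying NOT gates to whole subformulas, and confirms that for n = 8 it has $n^2 = 64$ leaves, 63 AND/OR gates, and computes parity on every input.

```python
from itertools import product

def parity_formula(variables):
    """Formula for the parity of `variables` via x XOR y = (x AND NOT y) OR (NOT x AND y)."""
    if len(variables) == 1:
        return ("var", variables[0])
    half = len(variables) // 2
    left, right = parity_formula(variables[:half]), parity_formula(variables[half:])
    return ("or", ("and", left, ("not", right)), ("and", ("not", left), right))

def evaluate(f, x):
    kind = f[0]
    if kind == "var":
        return bool(x[f[1]])
    if kind == "not":
        return not evaluate(f[1], x)
    values = [evaluate(child, x) for child in f[1:]]
    return all(values) if kind == "and" else any(values)

def count_leaves(f):
    """Formula size: leaves counted with multiplicity."""
    return 1 if f[0] == "var" else sum(count_leaves(c) for c in f[1:])

def count_gates(f):
    """Number of AND and OR gates; NOT gates are not counted."""
    if f[0] == "var":
        return 0
    return (f[0] != "not") + sum(count_gates(c) for c in f[1:])

n = 8
parity8 = parity_formula(list(range(n)))
print(count_leaves(parity8), count_gates(parity8))  # 64 63
```

Each level of the recursion uses both halves twice, so the size quadruples when n doubles, which is where the $n^2$ comes from.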

This observation combined with Lemma 4 and known quantum lower bounds for parity gives 
us the following theorem. 

Theorem 3. For any n, S, G, there is a read-many formula with n inputs, size at most S, and at
most G gates with bounded-error quantum query complexity $\Omega(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$.

Proof. If $\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\} = n$ (i.e., $S \ge n^2$ and $G \ge n^2$), then consider the parity function,
which has $Q(\mathrm{PARITY}_n) = \Omega(n)$ [7, 17]. Since the formula size and gate count of parity are $O(n^2)$,
this function has formula size $O(S)$ and gate count $O(G)$. By adjusting the function to compute
the parity of a constant fraction of the inputs, we can ensure that the formula size is at most S
and the gate count is at most G with the same asymptotic query complexity.

In the remaining two cases, we obtain the desired formula by composing a formula for parity
with AND gates. We apply Lemma 4 with $f = \mathrm{PARITY}_m$ and $g = \mathrm{AND}_{n/m}$ for some choice of m.
The resulting formula has $O(n)$ inputs, size $O(m^2 (n/m)) = O(nm)$, and gate count $O(m^2)$. Its
quantum query complexity is $\Omega(m \sqrt{n/m}) = \Omega(\sqrt{nm})$.

If $\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\} = \sqrt{S}$ (i.e., $S \le n^2$ and $S \le n\sqrt{G}$), let $m = S/n$. Then the formula
size is $O(S)$ and the gate count is $O(S^2/n^2) \le O(G)$. By appropriate choice of constants, we can
ensure that the formula size is at most S and the gate count is at most G. In this case, the query
complexity is $\Omega(\sqrt{S})$.

Finally, if $\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\} = n^{1/2} G^{1/4}$ (i.e., $G \le n^2$ and $S \ge n\sqrt{G}$), let $m = \sqrt{G}$. Then
the gate count is $O(G)$ and the formula size is $O(n\sqrt{G}) \le O(S)$. Again, by appropriate choice of
constants, we can ensure that the formula size is at most S and the gate count is at most G. In
this final case, the query complexity is $\Omega(n^{1/2} G^{1/4})$. □
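
The case analysis in this proof amounts to a choice of the parameter m. The following sketch (a hypothetical helper that drops all constant factors) selects m as in the proof and reports the resulting size, gate count, and query scaling, so the three regimes can be checked numerically.

```python
import math

def hard_instance(n, S, G):
    """Pick m as in the proof of Theorem 3 and report the resulting scalings (constants dropped)."""
    if S >= n**2 and G >= n**2:
        regime, m = "n", n                        # plain parity on n bits
    elif S <= n * math.sqrt(G):
        regime, m = "sqrt(S)", S / n
    else:
        regime, m = "n^(1/2) G^(1/4)", math.sqrt(G)
    return {
        "regime": regime,
        "m": m,
        "size": n * m,                 # O(m^2 * (n/m)) = O(nm)
        "gates": m**2,                 # O(m^2)
        "queries": math.sqrt(n * m),   # Omega(m * sqrt(n/m)) = Omega(sqrt(nm))
    }

# Size-limited regime: m = S/n = 100, so the bound is sqrt(S) = 1000.
print(hard_instance(n=10**4, S=10**6, G=10**8))
# Gate-limited regime: m = sqrt(G) = 100, so the bound is n^(1/2) G^(1/4) = 1000.
print(hard_instance(n=10**4, S=10**8, G=10**4))
```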

4 Constant-depth formulas 

Theorem 2 and Theorem 3 together completely characterize the quantum query complexity of for- 
mulas with n inputs, formula size S, and gate count G. However, while there exists such a formula 
for which the algorithm is optimal, particular formulas can sometimes be evaluated more efficiently. 
Thus it would be useful to have a finer characterization that takes further properties into account, 
such as the depth of the formula. In this section we consider formulas of a given depth, number of 
inputs, size, and gate count. 

Since the algorithm described in Section 2 works for formulas of any depth, we know that any 
depth-$k$ formula with G gates can be evaluated with $O(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$ queries. However,
the lower bound in Section 3 uses a formula with non-constant depth. Here we focus on proving 
lower bounds on constant-depth formulas. 

Consider the ONTO function (defined in [8]), which has a depth-3 formula and has nearly
maximal query complexity. For any positive even integer n, let $X_n$ be the set of functions from $[2n - 2]$
to $[n]$. Then the function $\mathrm{ONTO}: X_n \to \{0, 1\}$ has $\mathrm{ONTO}(f) = 1$ iff f is surjective. To view this
as a Boolean function, we can encode the $2n - 2$ values $f(i) \in [n]$ in binary, giving a function of
$N = (2n - 2) \log n$ bits. We sometimes use a subscript to indicate the number of input bits, writing
$\mathrm{ONTO}_N$ where $N = (2n - 2) \log n$.

Beame and Machmouchi showed that the query complexity of the ONTO function is linear in 
the size of the range, i.e., nearly linear in the number of input bits: 

Proposition 2 (Corollary 6 of [8]). $Q(\mathrm{ONTO}_N) = \Omega(N/\log N)$.

Furthermore, ONTO has a simple depth-3 formula of size $O(n^2 \log n) = \tilde{O}(N^2)$, namely [8]

$$\mathrm{ONTO}(f) = \bigwedge_{j \in [n]} \; \bigvee_{i \in [2n-2]} \; \bigwedge_{\ell=0}^{\log_2 n - 1} f(i)_\ell^{\,j_\ell}, \tag{3}$$

where $f(i)_\ell$ is the $\ell$th bit in the binary encoding of f(i), $j_\ell$ is the $\ell$th bit in the binary encoding of
j, and $x^b$ is x if b = 1 and $\bar{x}$ if b = 0.
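
The depth-3 formula can be checked against the definition of surjectivity by brute force for a small instance. The sketch below assumes that n is a power of 2, that the range is represented as {0, ..., n-1}, and that bits are indexed from the least significant; these conventions and the function names are illustrative choices, not taken from [8].

```python
from itertools import product

def onto_direct(values, n):
    """ONTO(f) = 1 iff f: [2n-2] -> [n] hits every element of the range."""
    return set(values) == set(range(n))

def onto_depth3(values, n):
    """Evaluate AND_j OR_i AND_l (bit l of f(i) agrees with bit l of j)."""
    width = n.bit_length() - 1        # log2(n); assumes n is a power of two
    bits = [[(v >> l) & 1 for l in range(width)] for v in values]

    def literal(i, l, j):
        # The literal is the input bit itself when bit l of j is 1, and its negation otherwise.
        return bits[i][l] if (j >> l) & 1 else 1 - bits[i][l]

    return all(
        any(all(literal(i, l, j) for l in range(width)) for i in range(len(values)))
        for j in range(n)
    )

n = 4
agree = all(
    onto_direct(f, n) == onto_depth3(f, n)
    for f in product(range(n), repeat=2 * n - 2)
)
formula_size = n * (2 * n - 2) * (n.bit_length() - 1)   # one leaf per (j, i, l)
print(agree, formula_size)  # True 48
```

The leaf count $n (2n - 2) \log_2 n$ is the $O(n^2 \log n)$ size quoted above.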

Now using the ONTO function instead of parity in the proof of Theorem 3, we get the following.

Theorem 4. For any N, S, G, there is a depth-3 formula with N inputs, size at most S, and at
most G gates with quantum query complexity $\tilde{\Omega}(\min\{N, \sqrt{S}, N^{1/2} G^{1/4}\})$.

This gives us a matching lower bound, up to log factors, for any $k \ge 3$. Thus we have a nearly
tight characterization of the query complexity of evaluating depth-$k$ formulas with n inputs, size
S, and G gates for all $k \ge 3$. Furthermore, since any depth-1 formula is either the AND or OR of
a subset of the inputs, the query complexity of depth-1 formulas is easy to characterize. Thus we 
have a characterization for all depths other than depth 2, and it remains to consider the depth-2 
case. Since all depth-2 circuits are formulas, we will refer to depth-2 formulas as depth-2 circuits 
henceforth and use circuit size (i.e., the number of gates) as the size measure. 

There are also independent reasons for considering the query complexity of depth-2 circuits. 
Improved lower bounds on depth-2 circuits of size n imply improved lower bounds for the Boolean 
matrix product verification problem. We explain this connection and exhibit new lower bounds for 
the Boolean matrix product verification problem in Section 5.1. 

Furthermore, some interesting problems can be expressed with depth-2 circuits, and improved 
upper bounds for general depth-2 circuits would give improved algorithms for such problems. For 
example, the graph collision problem [24] for a graph with G edges can be written as a depth-2 
circuit of size G. Below, we exhibit a lower bound for depth-2 circuits that does not match the 
upper bound of Theorem 2. If there exists an algorithm that achieves the query complexity of the 
lower bound, this would improve the best known algorithm for graph collision, and consequently 
the triangle problem [24]. 

We begin with lower bounds for depth-2 circuits. The trivial lower bound for depth-2 circuits 
is $\Omega(\sqrt{n})$ due to the OR function. In this section we obtain better lower bounds for depth-2 circuits
using the element distinctness problem. 

In the element distinctness problem, we are given a string $x_1 x_2 \cdots x_n \in [n]^n$ of length n over
an alphabet of size n, and we are asked if there are two positions in the string that hold the same symbol,
i.e., whether there exist $i, j \in [n]$ with $i \ne j$ such that $x_i = x_j$. Solving this problem requires
$\Omega(n^{2/3})$ [1, 2, 23] queries to an oracle that returns $x_i$ when queried with i.

To express this problem as a Boolean function, we represent each of the n inputs in binary using log n
bits. The size of the input is $N = n \log n$; in terms of N, the lower bound is $\Omega((N/\log N)^{2/3})$.
Observe that the element distinctness problem can be represented by a depth-2 circuit of size $O(n^3)$.
One way to see this is by noting that we can check the condition $x_i = x_j = k$ for any $i, j, k \in [n]$
using one AND gate and some NOT gates. Then we just take the OR of $\binom{n}{2} n = O(n^3)$ such clauses to
check whether two distinct inputs map to the same output. This immediately gives the following:

Theorem 5. There exists a depth-2 circuit of size $O((N/\log N)^3)$ that requires $\Omega((N/\log N)^{2/3})$ 
quantum queries to evaluate. 
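
The clause construction behind Theorem 5 can be sketched in Python. This is only an illustrative sketch, not part of the original argument: the helper names `to_bits`, `clause`, `element_distinctness`, and `num_clauses` are hypothetical, symbols are taken to be 0-indexed, and each symbol is assumed to be encoded with $\lceil \log_2 n \rceil$ bits.

```python
from itertools import combinations

def to_bits(x, b):
    """Encode each symbol of x as a list of b bits (least significant first)."""
    return [[(s >> t) & 1 for t in range(b)] for s in x]

def clause(bits, i, j, k, b):
    """One AND gate: every bit of x_i and of x_j agrees with the matching bit of k.
    A bit of k equal to 0 corresponds to a NOT gate on that input bit."""
    return all(bits[i][t] == ((k >> t) & 1) and bits[j][t] == ((k >> t) & 1)
               for t in range(b))

def element_distinctness(x):
    """OR over all pairs i < j and all symbols k of the clause 'x_i = x_j = k'."""
    n = len(x)
    b = max(1, (n - 1).bit_length())  # assumed encoding length per symbol
    bits = to_bits(x, b)
    return any(clause(bits, i, j, k, b)
               for i, j in combinations(range(n), 2) for k in range(n))

def num_clauses(n):
    """Number of AND gates feeding the OR gate: C(n, 2) * n = O(n^3)."""
    return n * (n - 1) // 2 * n
```

The sketch evaluates the circuit classically by reading every bit; it illustrates the circuit's structure and size, not a query-efficient algorithm.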

For the application to Boolean matrix product verification, we need lower bounds on depth-2 
circuits with n gates. Such a lower bound is easy to obtain from Theorem 5 using Lemma 4: 

Corollary 1. There exists a depth-2 circuit on $n$ inputs of size $n$ that requires $\Omega(n^{5/9}) = \Omega(n^{0.555})$ 
quantum queries to evaluate. 

Proof. Let $f$ be the element distinctness function. We know that $k_f = 2$ and $G_f = O(n_f^3)$. Let $g$ be 
the AND function, which has $k_g = 1$ and $G_g = 1$. Composing these functions using Lemma 4 gives 
a depth-2 circuit $h$ with $n_h = n_f n_g$, $G_h = O(n_f^3)$, and $Q(h) = \Omega(n_f^{2/3} \sqrt{n_g})$. Choosing $n_f = n_h^{1/3}$ 
and $n_g = n_h^{2/3}$, we have $G_h = O(n_h)$ and $Q(h) = \Omega(n_h^{5/9})$. Adjusting $n_f$ and $n_g$ by logarithmic 
factors, we can set $G_h = n_h$ with only a logarithmic adjustment to the query complexity. □ 
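
The choice of parameters in this proof can be checked with exact exponent arithmetic. The following is a small sketch under the assumptions stated in the proof (gate count $O(n_f^3)$ and query complexity $\Omega(n_f^{2/3} \sqrt{n_g})$, ignoring logarithmic factors); the function name `corollary1_exponents` is hypothetical.

```python
from fractions import Fraction

def corollary1_exponents(a):
    """Take n_f = n_h^a and n_g = n_h^(1 - a), so that n_f * n_g = n_h.

    Returns the exponents of n_h in the gate count G_h = O(n_f^3) and in the
    query lower bound Q(h) = Omega(n_f^(2/3) * n_g^(1/2))."""
    a = Fraction(a)
    gates = 3 * a
    queries = Fraction(2, 3) * a + Fraction(1, 2) * (1 - a)
    return gates, queries

# The proof's choice a = 1/3 gives a linear gate count and exponent 5/9.
gates, queries = corollary1_exponents(Fraction(1, 3))
```

Any larger value of `a` would raise the query exponent but push the gate count above linear, which is why the proof stops at $a = 1/3$.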

The results of this section also allow us to characterize (up to log factors) the quantum query 
complexity of bounded-fanout $AC^0$ circuits, which we call $AC^0_b$ circuits. $AC^0_b$ circuits are like $AC^0$ 
circuits where the gates are only allowed to have $O(1)$ fanout, as opposed to the arbitrary fanout 
that is allowed in $AC^0$. Note that as complexity classes, $AC^0$ and $AC^0_b$ are equal, but the conversion 
from an $AC^0$ circuit to an $AC^0_b$ circuit will in general increase the circuit size. 

The reason that formula size bounds apply to $AC^0_b$ circuits is that an $AC^0_b$ circuit can be 
converted to a formula with a constant factor increase in size. (However, this constant depends 
exponentially on the depth.) Thus we have the following corollary: 

Corollary 2. Any language $L$ in $AC^0_b$ has quantum query complexity $O(\min\{n, n^{1/2} G^{1/4}\})$ where 
$G(n)$ is the size of the smallest circuit family that computes $L$. Furthermore, for every $G(n)$, 
there exists a language in $AC^0_b$ that has circuits of size $G(n)$ and that requires $\tilde{\Omega}(\min\{n, n^{1/2} G^{1/4}\})$ 
quantum queries to evaluate. 

5 Applications 

5.1 Boolean matrix product verification 

A decision problem closely related to the old and well-studied matrix multiplication problem is 
the matrix product verification problem. This is the task of verifying whether the product of two 
matrices equals a third matrix. The Boolean matrix product verification (BMPV) problem is the 
same problem where the input comprises Boolean matrices and the matrix product is performed 
over the Boolean semiring, i.e., the "sum" of two bits is their logical OR and the "product" of two 
bits is their logical AND. 

More formally, the input to the problem consists of three $n \times n$ matrices $A$, $B$ and $C$. We have 
to determine whether $C_{ij} = \bigvee_k A_{ik} \wedge B_{kj}$ for all $i, j \in [n]$. Note that the input size is $3n^2$, so $O(n^2)$ 
is a trivial upper bound on the query complexity of this problem. Buhrman and Špalek [13] show 
that BMPV can be solved in $O(n^{3/2})$ queries. This bound can be obtained by noting that checking 
the correctness of a single entry of $C$ requires $O(\sqrt{n})$ queries, so one can use Grover's algorithm to 
search over the $n^2$ entries for an incorrect entry. Another way to obtain the same bound is to show 
that there exists a formula of size $O(n^3)$ that expresses the statement $AB = C$ and then apply 
Theorem 1. 
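
To make the definition concrete, the following Python sketch checks the BMPV condition classically. It is only meant to pin down the predicate being verified: it reads all $3n^2$ entries and is not the quantum algorithm discussed above. The function name `bmpv` and the representation of matrices as lists of 0/1 rows are assumptions of this sketch.

```python
def bmpv(A, B, C):
    """Return True iff C[i][j] == OR_k (A[i][k] AND B[k][j]) for all i, j.

    A, B, C are n x n matrices given as lists of rows with 0/1 entries."""
    n = len(A)
    return all(
        bool(C[i][j]) == any(A[i][k] and B[k][j] for k in range(n))
        for i in range(n)
        for j in range(n)
    )

# Small usage example: A is the identity, so the Boolean product AB equals B.
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
```

Each entry check is an OR of $n$ ANDs, and the overall predicate is an AND over the $n^2$ entries, which is where the size-$O(n^3)$ formula comes from.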

While Buhrman and Špalek do not explicitly state a lower bound for this problem, their 
techniques yield a lower bound of $\Omega(n)$ queries. This leaves a gap between the best known upper 
and lower bounds. The following theorem summarizes known facts about Boolean matrix product 
verification. 

Theorem 6. If $A$, $B$ and $C$ are $n \times n$ Boolean matrices available via oracles for their entries, 
checking whether the Boolean matrix product of $A$ and $B$ equals $C$, i.e., checking whether $C_{ij} = 
\bigvee_k A_{ik} \wedge B_{kj}$ for all $i, j \in [n]$, requires at least $\Omega(n)$ quantum queries and at most $O(n^{3/2})$ quantum 
queries. 

We use the results of the previous section to improve the lower bound to $\Omega(n^{19/18})$. The first step 
is to show how the problem of evaluating depth-2 circuits relates to the Boolean matrix product 
verification problem. 

Let the Boolean vector product verification problem for a given $n \times n$ matrix $A$ ($\mathrm{BVPV}_A$) be the 
problem of deciding if a given vector $v$ satisfies $Av = 1$, where $1$ is the all-ones vector of length $n$. 
Note that $A$ is part of the specification of the problem and not part of the input. Let the Boolean 
function computed by this problem be denoted as $\mathrm{BVPV}_A(v)$, where $v$ is a Boolean vector of size $n$. 

Since $v$ is a vector of size $n$, the query complexity of this problem, $Q(\mathrm{BVPV}_A)$, is upper bounded 
by $n$. Observe that we can write $\mathrm{BVPV}_A(v)$ as $\bigwedge_i \bigvee_j (A_{ij} \wedge v_j)$, which is a monotone depth-2 circuit 
with $n$ OR gates, one AND gate, and $n$ input variables $v_j$. More interestingly, every monotone depth-2 
circuit with $n$ OR gates, one AND gate, and $n$ input variables corresponds to $\mathrm{BVPV}_A$ for some matrix 
$A$. 

It follows that lower bounds for linear-sized depth-2 circuits also yield lower bounds for this 
problem. From Corollary 1, we know that there exists a depth-2 circuit with $n$ gates that requires 
$\Omega(n^{0.555})$ queries to evaluate. Without loss of generality, we can assume (by Lemma 3) that the 
circuit is monotone and that its top gate is an AND gate. Thus there exists a matrix $A$ such that 
$Q(\mathrm{BVPV}_A) = \Omega(n^{0.555})$. 

The next lemma shows us how the BVPV problem is related to the BMPV problem. 

Lemma 5. For any $n \times n$ matrix $A$, $Q(\mathrm{BMPV}) = \Omega(\sqrt{n}\, Q(\mathrm{BVPV}_A))$. 

Proof. We prove a lower bound for the special case of BMPV where $C$ is the all-ones matrix, $J$. In 
this case, checking whether $AB = J$ is equivalent to checking whether $A b_i = 1$ for all $i$, where $b_i$ 
denotes the $i$th column of $B$. Indeed, we can think of this problem as $n$ independent instances of 
the $\mathrm{BVPV}_A$ problem: the output of this special case of BMPV with the first matrix being $A$ is 1 if 
and only if all $n$ instances of $\mathrm{BVPV}_A$ output 1. In other words, the BMPV problem for a fixed $A$ and 
$C = J$ is just $\mathrm{BVPV}_A(b_1) \wedge \cdots \wedge \mathrm{BVPV}_A(b_n) = \mathrm{AND}_n \circ (\mathrm{BVPV}_A, \ldots, \mathrm{BVPV}_A)(b_1, \ldots, b_n)$. 

As in Section 3.1, we use Theorem 1.5 of [27] to conclude that the quantum query complexity of the 
BMPV problem for a fixed $A$ and $C = J$ is $\Omega(Q(\mathrm{AND}_n)\, Q(\mathrm{BVPV}_A)) = \Omega(\sqrt{n}\, Q(\mathrm{BVPV}_A))$. Since the general BMPV problem 
can only be harder than this special case, we get the desired lower bound. □ 
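
The decomposition used in this proof can be sanity-checked by brute force on small instances. The sketch below is illustrative only: the function names (`bvpv`, `bmpv_all_ones`, `columns`, `check_reduction`) are hypothetical, matrices are lists of 0/1 rows, and the exhaustive loop is feasible only for very small $n$.

```python
from itertools import product

def bvpv(A, v):
    """BVPV_A(v): AND over rows i of OR over j of (A[i][j] AND v[j])."""
    return all(any(a and x for a, x in zip(row, v)) for row in A)

def bmpv_all_ones(A, B):
    """Special case C = J: is the Boolean product AB the all-ones matrix?"""
    n = len(A)
    return all(any(A[i][k] and B[k][j] for k in range(n))
               for i in range(n) for j in range(n))

def columns(B):
    """Columns b_1, ..., b_n of B as lists."""
    return [list(col) for col in zip(*B)]

def check_reduction(A):
    """Compare BMPV with C = J against AND_n of BVPV_A over all matrices B."""
    n = len(A)
    for entries in product([0, 1], repeat=n * n):
        B = [list(entries[r * n:(r + 1) * n]) for r in range(n)]
        if bmpv_all_ones(A, B) != all(bvpv(A, b) for b in columns(B)):
            return False
    return True
```

For example, `check_reduction([[1, 1], [0, 1]])` compares the two sides on all 16 choices of a 2 x 2 matrix $B$.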

Using this lemma and the lower bound obtained earlier, $Q(\mathrm{BVPV}_A) = \Omega(n^{5/9}) = \Omega(n^{0.555})$, we 
get the main result of this section: 

Theorem 7. The bounded-error quantum query complexity of the Boolean matrix product 
verification problem is $\Omega(n^{19/18}) = \Omega(n^{1.055})$. 
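
For completeness, the exponent in Theorem 7 follows from Lemma 5 and Corollary 1 by adding exponents. Written out as a worked equation (a restatement of the steps above, with no new assumptions):

```latex
Q(\mathrm{BMPV})
  = \Omega\bigl(\sqrt{n}\, Q(\mathrm{BVPV}_A)\bigr)
  = \Omega\bigl(n^{1/2} \cdot n^{5/9}\bigr)
  = \Omega\bigl(n^{9/18 + 10/18}\bigr)
  = \Omega\bigl(n^{19/18}\bigr)
```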

5.2 Applications to classical circuit complexity 

In this section we present some classical applications of our results. In particular, we prove lower 
bounds on the number of gates needed in any formula representing certain functions. The main 
tool we use is the following corollary of Theorem 2. 

Corollary 3. For a function $f$ with $n$ inputs and quantum query complexity $Q(f)$, any 
(unbounded-fanin) formula representing $f$ requires $\Omega(Q(f)^4/n^2)$ gates. 

Almost immediately, this implies that functions such as PARITY and MAJORITY, which have 
quantum query complexity of $\Omega(n)$, require $\Omega(n^2)$ gates to be represented as a formula. Similarly, 
this implies that functions with query complexity $\Omega(n^{3/4})$, such as GRAPH CONNECTIVITY [15], 
GRAPH PLANARITY [4], and HAMILTONIAN CYCLE [10], require formulas with $\Omega(n)$ gates. 
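
The two consequences above are instances of the same exponent calculation. As a minimal sketch (the function name `gate_exponent` is hypothetical), if $Q(f) = \Omega(n^q)$ then Corollary 3 gives a gate lower bound of $\Omega(n^{4q-2})$:

```python
from fractions import Fraction

def gate_exponent(q):
    """Exponent e such that Q(f) = Omega(n^q) forces Omega(n^e) formula gates,
    using the bound Omega(Q(f)^4 / n^2) from Corollary 3."""
    return 4 * Fraction(q) - 2

parity_like = gate_exponent(1)               # query complexity Omega(n)
graph_like = gate_exponent(Fraction(3, 4))   # query complexity Omega(n^(3/4))
```

Note that the bound is nontrivial only when $q > 1/2$, since otherwise the exponent $4q - 2$ is not positive.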

To compare this with previous results, it is known that PARITY requires formulas of size 
$\Omega(n^2)$ [21]. This result is implied by our result since the number of gates is less than the size 
of a formula. 

We can also use these techniques to address the question "Given a constant-depth circuit of 
size G, how efficiently can this circuit be expressed as a formula?" The best result of this type that 
we are aware of is the following: There exists a constant-depth circuit of linear size such that any 
formula expressing the same function has size at least $n^{2-o(1)}$. The result appears at the end of 
Section 6.2 in [20], where the function called $V_{\mathrm{OR}}(x, y)$ is shown to have these properties. Indeed, 
the function has a depth-3 formula with $O(n)$ gates. The idea of using such functions, also called 
universal functions, is attributed to Nechiporuk [25]. 

We construct an explicit constant-depth circuit of linear size that requires $\Omega(n^{2-\epsilon})$ gates to be 
expressed as a formula. To the best of our knowledge, this result is new. Our result is incomparable 
to the previous result since we lower bound the number of gates, which also lower bounds the formula 
size, but we use a constant-depth circuit as opposed to a depth-3 formula. 

Improving our lower bound to $\Omega(n^2)$ seems difficult, since we do not even know an explicit 
constant-depth circuit of linear size that requires formulas of size (as opposed to number of gates) 
$\Omega(n^2)$, which is a weaker statement. In fact, we do not know any explicit function in $AC^0$ with a 
formula size lower bound of $\Omega(n^2)$ (for more information, see [22]). 

Theorem 8. For every $\epsilon > 0$, there exists a constant-depth unbounded-fanin circuit (i.e., an $AC^0$ 
circuit) of size $O(n)$ such that any (unbounded-fanin) formula computing the same function must 
have $\Omega(n^{2-\epsilon})$ gates. 

Proof. The aim is to construct a function in $AC^0$ with linear size and nearly maximal quantum 
query complexity, and then to apply Corollary 3. We know that there exists a depth-3 function, 
ONTO, with query complexity $\Omega(n/\log n)$ and size $O(n^2/\log n)$ (Proposition 2). Composing the 
ONTO function with itself using Lemma 4 gives a depth-5 circuit of smaller size with a similar 
query complexity lower bound. 

In general, if we have a function $f$ with a circuit of depth $k$, size $O(n^r)$ for some $1 < r \le 2$, and 
query complexity $\Omega(n/\log^c n)$, then we can construct a function $f'$ with a circuit of depth $2k - 1$, 
size $O(n^{r^2/(2r-1)})$, and query complexity $\Omega(n/\log^{2c} n)$. This is achieved by composing the function 
$f$ on $m$ inputs with $f$ on $n/m$ inputs to get a new function $f'$ on $n$ inputs. The size of the resulting 
circuit is $O(m^r + n^r/m^{r-1})$. To make the two terms equal, we choose $m^{2r-1} = n^r$, which gives the 
claimed size. The query complexity of $f'$ is $\Omega((m/\log^c m)((n/m)/\log^c (n/m))) = \Omega(n/\log^{2c} n)$. 

Now we can iterate this construction to get smaller circuits with about the same query 
complexity. Since the size of the circuit decreases from $O(n^r)$ to $O(n^{r^2/(2r-1)})$, the circuit size approaches 
$O(n)$ as the number of iterations increases. Clearly, for any $\delta > 0$, we can reach a circuit of size 
$O(n^{1+\delta})$ with a constant number of iterations. The resulting function has constant depth and query 
complexity $\Omega(n/\log^c n)$, for some constant $c$ that depends on $\delta$. Now we can define a new function 
$g$ on $n$ inputs that is this function acting on the first $n^{1/(1+\delta)}$ bits. Clearly this function has a 
linear-size circuit, and its query complexity is $\Omega(n^{1/(1+\delta)}/\log^c n)$, which is $\Omega(n^{1-\epsilon'})$ for some $\epsilon' > 0$. 
Since $\delta > 0$ could be chosen arbitrarily, we can achieve any $\epsilon' > 0$ in this step. Corollary 3 now 
completes the proof. □ 
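
The iteration in this proof can be traced numerically. The following sketch assumes the starting point is the ONTO function viewed as a depth-3 circuit of size $O(n^2)$ (so $r = 2$, $k = 3$) and tracks only the size exponent and the depth; the names `shrink` and `iterate_until` are hypothetical.

```python
from fractions import Fraction

def shrink(r):
    """One self-composition step: size exponent r becomes r^2 / (2r - 1)."""
    return r * r / (2 * r - 1)

def iterate_until(delta, r=Fraction(2), depth=3):
    """Repeat the composition until the size exponent is at most 1 + delta.

    Returns (number of steps, final size exponent, final depth), where each
    step maps depth k to 2k - 1."""
    steps = 0
    while r > 1 + delta:
        r = shrink(r)
        depth = 2 * depth - 1
        steps += 1
    return steps, r, depth

# Starting from r = 2 the exponents are 4/3, 16/15, 256/255, ...
result = iterate_until(Fraction(1, 10))
```

Because the excess $r - 1$ is roughly squared at each step, the number of iterations needed to reach exponent $1 + \delta$ is a constant depending only on $\delta$, which is what keeps the depth constant.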

6 Conclusions and open problems 

We have given a tight characterization of the query complexity of read-many formulas in terms of 
their number of inputs n, formula size S, and gate count G. In particular, we showed that the query 
complexity of evaluating this class of formulas is $\Theta(\min\{n, \sqrt{S}, n^{1/2} G^{1/4}\})$. Our results suggest 
several new avenues of research, looking both toward refined characterizations of the query complexity 
of evaluating formulas and toward a better understanding of the quantum query complexity of 
evaluating circuits. 

In Section 4 we showed that our query complexity bounds are nearly tight for all formulas of 
a given depth except for the case of depth 2. We made partial progress on this remaining case by 
giving a depth-2 circuit of size $n$ that requires $\Omega(n^{0.555})$ queries, whereas our algorithm gives an 
upper bound of $O(n^{0.75})$. 

We expect that it should be possible to improve the lower bound for linear-size depth-2 circuits. 
We have considered candidate depth-2 circuits, based on affine or projective planes, that seem 
difficult to evaluate. Here we describe a circuit based on a projective plane, which is a set $P$ of 
points and a set $L$ of lines such that any two distinct points are on a unique line, any two distinct 
lines intersect at a unique point, and there exist four points with no three of them on the same 
line. A projective plane of order $q$ consists of $n = q^2 + q + 1$ points and $n$ lines, where each line 
contains $q + 1$ points and each point is on $q + 1$ lines. We write $i \in \ell$ to indicate that the point 
$i \in P$ is on the line $\ell \in L$. Consider the depth-2 circuit 

$$\bigvee_{\ell \in L} \bigwedge_{i \in \ell} x_i \qquad (4)$$

where $x_i$ is a bit assigned to the point $i \in P$. This circuit has $n$ variables and $n + 1$ gates. Clearly, 
the 1-certificate complexity of this formula is $q + 1$; it can also be shown that when $q$ is a square, 
the 0-certificate complexity is $q^{3/2} + 1$ (see for example [6]). Thus the certificate complexity barrier 
[28, 29] only rules out proving a better lower bound than $\Omega(n^{5/8}) = \Omega(n^{0.625})$ using the adversary 
method with positive weights. However, we are not aware of any lower bound better than $\Omega(n^{3/8})$, 
or any upper bound better than the $O(n^{3/4})$ bound of Theorem 2. 
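
To make circuit (4) concrete, the sketch below builds a projective plane and evaluates the circuit classically. It makes assumptions that are not in the text: it uses the standard construction of the plane $PG(2, q)$ from homogeneous coordinates over the integers modulo $q$, which is valid only when $q$ is prime, and the names `projective_plane` and `evaluate` are hypothetical.

```python
from itertools import product

def projective_plane(q):
    """Points and lines of PG(2, q), assuming q is prime.

    Points are nonzero triples over Z_q normalized so that the first nonzero
    coordinate is 1. By duality, lines are indexed by the same triples, and
    point p lies on line l when their dot product is 0 modulo q."""
    pts = [p for p in product(range(q), repeat=3)
           if any(p) and p[next(i for i, c in enumerate(p) if c)] == 1]
    lines = [frozenset(i for i, p in enumerate(pts)
                       if sum(a * b for a, b in zip(l, p)) % q == 0)
             for l in pts]
    return pts, lines

def evaluate(lines, x):
    """Circuit (4): OR over lines of the AND of the bits on that line."""
    return any(all(x[i] for i in line) for line in lines)

# Order q = 2 gives the Fano plane with n = 7 points and 7 lines.
pts, lines = projective_plane(2)
```

Setting the $q + 1$ bits on a single line to 1 is enough to make the circuit output 1, which is the 1-certificate mentioned above.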

While we have made progress in understanding the quantum query complexity of evaluating 
read-many formulas, we would also like to understand the query complexity of evaluating general 
circuits. It would be interesting to find upper and lower bounds on the query complexity of 
evaluating circuits as a function of various parameters such as their number of inputs, gate count, 
fanout, and depth. In particular, the graph collision problem can also be expressed using a circuit 
of depth 3 and linear size (in addition to the naive depth-2 circuit of quadratic size mentioned in 
Section 4), so it would be interesting to focus on the special case of evaluating such circuits. 